Clone the git repository:
$ git clone https://github.com/ontop/vig.git
To build the project, run the following bash script from the vig folder:
$ ./build.sh
If the build script ran successfully, you should see output like the following:
[SCRIPT] This is your config information:
[SCRIPT] resources folder: ${HOME}/resources
[SCRIPT] configuration file: ${HOME}/resources/configuration.conf
[SCRIPT] csvs folder: ${HOME}/resources/csvs
The configuration.conf file contains the configuration of the generator, and looks as follows:
# ====================
# Mandatory parameters
# ====================
jdbc-connector jdbc:mysql       # The only connector supported at the moment.
database-url <addr:port/dbName> # The address, port, and name of the source database.
database-user <user> # The username for the access to the source database.
database-pwd <pwd> # The password for the access to the source database.
# VIG generation mode. Either DB or OBDA (default). Since VIG 1.1, OBDA mode is preferred.
# The NPD Benchmark, v1.8.0 onwards, should be run in OBDA mode.
# OBDA mode also reads statistics from the mappings, and supports fixed-domain columns.
mode <DB|OBDA>
random-gen <true|false>         # If true, then the generator will behave as a pure
                                # random generator. DB-mode only.
obda-file <path/mappings.obda> # The location of the mapping file in .obda format.
                                # IMPORTANT: Connection parameters should also be
                                # set in this file. OBDA-mode only.
scale <value> # Scaling factor value. Default: 1.0
# ======================================================================================
# Advanced parameters. Commented out, as the default values are usually enough. You can
# check what the default values are set to by running VIG with the --help option.
# ======================================================================================
# fixed <table1.col1> <table2.col2> ... # Manually specified fixed-domain columns. OBDA-mode only.
# non-fixed <table1.col1> <table2.col2> ... # Manually specified non-fixed-domain columns. OBDA-mode only.
# Time (ms) allowed for the columns-cluster analysis. Given a columns cluster {A,B,C} of columns A, B,
# and C, VIG tries to compute the cardinality of all possible intersections between these three columns,
# namely AB, AC, BC, and ABC (see the query sketch after this listing). If the timeout is reached, say,
# while evaluating the cardinality of the intersection ABC, then that cardinality is assumed to be zero
# (hence, no value will be generated in the intersection of the columns A, B, and C). OBDA-mode only.
# cc-timeout <value>
# tables <table1> <table2> ... # Generate only the specified tables.
# columns <table1.col1> <table2.col2> ... # Generate only the specified columns.
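Conceptually, the cardinality of an intersection such as AB is the number of distinct values shared by the two columns. A minimal SQL sketch of that count, using hypothetical tables t1 and t2 with cluster columns A and B (these names are illustrative, not taken from any actual schema):

-- Number of distinct values occurring in both t1.A and t2.B,
-- i.e., the cardinality of the intersection AB.
SELECT COUNT(*) FROM (
    SELECT DISTINCT A FROM t1
    WHERE A IN (SELECT B FROM t2)
) AS shared_values;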
For a complete list of the VIG command-line options, refer to the built-in help by running
$ java -jar vig.jar --help
or
$ java -jar vig.jar --help-verbose
Command-line options override the parameters provided through the configuration file.
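For example, the resources folder can be redirected from the command line without editing the configuration (here "myResources" is just a hypothetical path; run --help for the remaining option names):
$ java -jar vig.jar --res="myResources"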
Once the configuration file is set up, simply execute the jar:
$ java -jar vig.jar
The following walk-through uses the NPD Benchmark as an example.
1) Launch the build script from the vig directory:
$ ./build.sh
2) Create the desired source database, e.g.
$ mysql --user="username" --host="address" --password="password" npdSource < npd-data-dump.sql
3) Set up the configuration file in resources/configuration.conf:
jdbc-connector jdbc:mysql
database-url db_host/db_name
database-user user
database-pwd pwd
mode OBDA
obda-file resources/npd-v2-ql_a.obda
scale 2
In our configuration, we have set the scaling factor to 2 through the parameter scale 2, so the generated data will be roughly twice the size of the source database.
4) Since we have set the mode to OBDA, as in our example, we need to put the mappings in the location specified in the configuration file (resources/npd-v2-ql_a.obda in our example). Moreover, we need to set up the connection parameters in the mappings file:
[SourceDeclaration]
sourceUri http://sws.ifi.uio.no/vocab/npd-v2
connectionUrl jdbc:mysql://db_host/db_name
username user
password pwd
driverClass com.mysql.jdbc.Driver
5) We are now ready to run VIG, specifying the location of the resources folder.
$ java -jar vig.jar --res="resources"
6) The CSV files will be generated in the directory resources/csvs.
7) Import the CSV files into the RDBMS.
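For MySQL, one way to do this is LOAD DATA LOCAL INFILE. A minimal sketch, assuming a generated file resources/csvs/company.csv and a hypothetical target table company in a database npdScaled (adjust table names, delimiters, and paths to your setup):
$ mysql --user="username" --password="password" --local-infile=1 npdScaled
mysql> LOAD DATA LOCAL INFILE 'resources/csvs/company.csv'
    -> INTO TABLE company
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n';
Alternatively, the mysqlimport utility provides the same functionality directly from the shell.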